1 Algoritma

Algoritma is a data science education center based in Jakarta. We organize workshops and training programs to help working professionals and students gain mastery in various data science sub-fields: data visualization, machine learning, data modeling, statistical inference, and more. Visit our website for all upcoming workshops.

2 Libraries and Setup

We’ll set up the chunk options for this notebook. Some of the code we will write can get computationally expensive, so caching can be switched on with cache = TRUE; it is left off here.

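options(width = 50)
# cache = FALSE reruns every chunk on each knit; set it to TRUE to cache expensive chunks
knitr::opts_chunk$set(cache = FALSE, tidy = TRUE)
options(scipen = 9999)  # avoid scientific notation in printed numbers
rm(list = ls())  # start from an empty workspace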

You will need to use install.packages() to install any packages that are not already downloaded onto your machine. You then load the package into your workspace using the library() function:

library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(reshape2)
library(leaflet)
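
If you are not sure which of these packages are already installed, a small check along the following lines may help. This is only a sketch: missing_packages() is a hypothetical helper written for this example, and the install.packages() call is left commented out so nothing is installed without your say-so.

```r
# Packages loaded in this notebook (taken from the library() calls above)
required_packages <- c("dplyr", "tidyr", "ggplot2", "plotly", "reshape2", "leaflet")

# Hypothetical helper: return the names of the packages that are not installed
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
    message("Missing packages: ", paste(to_install, collapse = ", "))
    # install.packages(to_install)  # uncomment to install them
}
```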

3 Data Visualisation Capstone Project

After having learned and explored appropriate techniques for visualizing data, students are required to deploy an interactive dashboard web application on a Shiny server. The dashboard should contain plotting objects, such as ggplot and/or leaflet outputs, that display useful insights.

3.1 First Objective

Before you build the dashboard, answer the following questions first; they will help you create a dashboard with useful insights.

3.1.1 What

What is the dashboard about?

This question is self-explanatory: you should know what the dashboard is about, what problem you are trying to solve with it, and what story you are trying to tell your audience.

3.1.2 Who

Who is the user of your dashboard?

Knowing the user of your dashboard is very important. Which division or what kind of people will use it? Do they need a detailed dashboard or a more practical one? Users at the operational level need a detailed dashboard, while users at the managerial level need a simple, general dashboard that conveys the insight quickly.

3.1.3 Why

Why did you choose that data?

How well do you understand the data, and can it answer your question? Why did you choose those variables; are they really correlated? It is important to know why you chose the data so that you don’t create a misleading insight, which can be very dangerous.

3.1.4 When

When is the data collected?

Is it still relevant? For example, you can’t use data from the 80s to describe traffic today. Trends keep changing, and so does the answer to your question, so ask whether very old data can still answer it. Irrelevant data can create a misleading insight.

3.1.5 Where

Where will you put your plots, value boxes, inputs, etc.?

Make a simple layout design so you have a picture of what your end product will look like. Is it tidy enough? Is it easy enough for your user to understand? Always follow the 5-second rule: your dashboard should provide the relevant information in about 5 seconds.

3.1.6 How

How does your dashboard answer the question, hypothesis, or problem you are trying to solve?

Are you using the right plot and the right variables? Always start from your problem and make sure you use the right plot for it. For example, which plot would you use to see the distribution of your data: a density plot or a line plot?
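
As a minimal sketch of matching the plot to the question, the block below draws a density plot to inspect a distribution. It uses R’s built-in faithful dataset as a stand-in for your own data; the object name dist_plot and the fill colour are illustrative choices only.

```r
library(ggplot2)

# `faithful` ships with R; it stands in for your own data here
dist_plot <- ggplot(faithful, aes(x = waiting)) +
    geom_density(fill = "#9AFE2E", alpha = 0.5) +
    labs(title = "Distribution of waiting time between eruptions",
        x = "Waiting time (minutes)", y = "Density")

dist_plot
```

A line plot would connect the observations in row order and say little about how often each value occurs, which is why the density plot is the better fit for a distribution question.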

3.2 Rubrics

In addition, students are free to use their own dataset or datasets from previous classes. Below are the rubrics for assessment and grading. Students earn the point(s) for:

3.2.1 Input (reactivity)

  • (2 points) Using at least 2 different input types
  • (2 points) Choosing appropriate input type
  • (2 points) Demonstrating useful input(s)

3.2.2 Tab (paging)

  • (3 points) Using at least 3 pages

3.2.3 Render plot

  • (1 point) Using an interactive plot
  • (2 points) Using at least 2 plot types
  • (2 points) Choosing the appropriate plot type
  • (2 points) Demonstrating reactivity from the input
  • (2 points) Creating plots that tell a clear story

3.2.4 Deploy

  • (6 points) Successfully deploying to shinyapps.io

3.2.5 User Interface Appearance

  • (2 points) Having a tidy page layout
  • (2 points) Having a tidy plot layout
  • (1 point) Having appropriate plot tooltips
  • (1 point) Choosing the right color scheme

If you meet all of these criteria, you will get a total of 30 points.

Data Visualization Track Rubrics
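
To make the input, tab, and render-plot criteria concrete, here is a minimal skeleton, not a model answer. It assumes the shiny package is installed and uses R’s built-in mtcars data; every input ID, output ID, and tab title is a hypothetical placeholder.

```r
library(shiny)
library(ggplot2)
library(plotly)

# Two different input types (select + slider) and three tabs (pages)
ui <- fluidPage(
    titlePanel("Rubric skeleton (illustrative)"),
    sidebarLayout(
        sidebarPanel(
            selectInput("cyl", "Cylinders", choices = sort(unique(mtcars$cyl))),
            sliderInput("hp", "Maximum horsepower",
                min = min(mtcars$hp), max = max(mtcars$hp), value = max(mtcars$hp))
        ),
        mainPanel(
            tabsetPanel(
                tabPanel("Scatter", plotlyOutput("scatter")),
                tabPanel("Distribution", plotlyOutput("density")),
                tabPanel("Data", tableOutput("table"))
            )
        )
    )
)

server <- function(input, output, session) {
    # Reactive subset: recomputed whenever either input changes
    filtered <- reactive({
        d <- mtcars[mtcars$cyl == as.numeric(input$cyl) & mtcars$hp <= input$hp, ]
        req(nrow(d) > 0)  # stop quietly when no rows match
        d
    })

    # Two plot types, both interactive and both driven by the inputs
    output$scatter <- renderPlotly({
        ggplotly(ggplot(filtered(), aes(x = wt, y = mpg)) + geom_point())
    })
    output$density <- renderPlotly({
        ggplotly(ggplot(filtered(), aes(x = mpg)) + geom_density())
    })
    output$table <- renderTable(filtered())
}

app <- shinyApp(ui, server)
# Run interactively with: runApp(app)
```

Keeping the filtering in one reactive() means both plots and the table always show the same subset.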

3.3 Reference

Demo from Algoritma:

3.4 Data Source

Some data source references:

3.5 Tips

  • Set your upload size limit. By default, Shiny limits file uploads to 5MB per file. You can modify this limit by using the shiny.maxRequestSize option. For example, adding this to the top of app.R would increase the limit to 200MB.

options(shiny.maxRequestSize=200*1024^2)

  • Use RDS to save your files.

saveRDS()

readRDS()
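
A minimal round trip might look like the sketch below. The toy data frame and the temporary file path are assumptions made for illustration; in a real app you would save the cleaned dataset next to app.R.

```r
# Toy data frame standing in for a cleaned dataset
clean_data <- data.frame(
    Country = c("A", "B"),
    Status = factor(c("Reserve", "Deficit")),
    stringsAsFactors = FALSE
)

# tempfile() is used for illustration only
rds_path <- tempfile(fileext = ".rds")
saveRDS(clean_data, rds_path)

# readRDS() returns the object, so assign it to a name of your choice
restored <- readRDS(rds_path)
identical(clean_data, restored)  # column types, including the factor, are preserved
```

Unlike a CSV file, an RDS file keeps column types such as factors, so the cleaning steps do not need to be repeated when the app starts.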

4 Planning Your Project

You can start working on and planning your project after the briefing ends. We expect you to finish answering the first objective today so that you can focus on building the Shiny dashboard for the rest of the week.

4.1 Estimated Time

Below is our estimated time to finish answering the first objective (5W+1H questions) during the briefing day.

4.2 Example

Below is our example of answering the 5W+1H questions above.

4.2.1 What (ETC: 10 Mins)

I want to show how every country around the world manages its natural resources, as shown by the values of the Ecological Footprint and Biocapacity of each country.

A country runs an Ecological Deficit when its Ecological Footprint (how many natural resources are used) exceeds its Biocapacity (how many natural resources are owned). Such a country imports Biocapacity through trade, liquidates national ecological assets, or emits large amounts of carbon dioxide into the air. Conversely, a country has an Ecological Reserve when its Biocapacity exceeds its Ecological Footprint. Countries whose Ecological Footprint is bigger than their Biocapacity therefore risk various ecological impacts such as natural disasters, land damage, loss of biodiversity, and other effects that can harm the environment and the country’s economy.

For this project, I specifically want to:

  • Increase people’s awareness of the ecological status of their country
  • Show the relation between the human development index and the ecological footprint of a country, to see whether countries with a higher human development index also have a higher ecological footprint
  • Provide the ecological status and other metrics that show the ecological activity of each country in the world

4.2.2 Who (ETC: 10 Mins)

This dashboard is created as an educational medium for the general public regarding the usage and preservation of countries’ natural resources.

4.2.3 Why and When (ETC: 40 Mins)

The dataset suitable for this project is the ecological data acquired from the Global Footprint Network (http://data.footprintnetwork.org/). The data was updated on August 12, 2019, so it is still relevant to current conditions.

footprint <- read.csv("data_input/countries.csv")

# Clean the raw table: strip "$" and "," from GDP, round the metrics, label each country's status
footprint <- footprint %>%
    mutate(
        Country = as.character(Country),
        GDP.per.Capita = as.numeric(gsub("[$,]", "", GDP.per.Capita)),
        HDI = round(HDI, 2),
        Countries.Required = round(Countries.Required, 2),
        # a positive balance is a reserve, anything else a deficit
        Biocapacity.Deficit = as.factor(ifelse(Biocapacity.Deficit > 0, "Reserve", "Deficit"))
    ) %>%
    rename(Status = Biocapacity.Deficit) %>%
    select(-Data.Quality) %>%
    drop_na()

footprint
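
To see what the two less obvious cleaning steps do, the sketch below applies them to a few made-up values. The numbers are hypothetical and are not taken from the dataset; only the gsub() pattern and the ifelse() rule come from the code above.

```r
# Hypothetical raw values in the same format as the GDP.per.Capita column
raw_gdp <- c("$614.66", "$4,534.37", "$57,220.00")
gdp <- as.numeric(gsub("[$,]", "", raw_gdp))  # drop "$" and "," before converting
gdp

# Hypothetical balances (biocapacity minus ecological footprint)
balance <- c(-0.3, 2.1, 0)
status <- as.factor(ifelse(balance > 0, "Reserve", "Deficit"))
status
```

Note that under this rule a balance of exactly 0 is labelled "Deficit", because the condition is strictly greater than zero.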

4.2.4 How (ETC: 30 Mins)

Explain how to achieve each goal or purpose stated in the What question.

  • Increase people’s awareness of the ecological status of their country

I will create a plot that shows the Ecological Footprint and Biocapacity for each region.

ef_region <- footprint %>%
    group_by(Region) %>%
    summarize(Ecological.Footprint = sum(Total.Ecological.Footprint)) %>%
    arrange(desc(Ecological.Footprint)) %>%
    mutate(text = paste0("Ecological Footprint: ",
        Ecological.Footprint, " gha"))

ef_reg_plot <- ggplot(ef_region, aes(x = reorder(Region, Ecological.Footprint),
    y = Ecological.Footprint, text = text)) +
    geom_col(aes(fill = Ecological.Footprint), show.legend = FALSE) +
    coord_flip() +
    labs(title = "Ecological Footprint by Region", y = "global hectares (gha)", x = NULL) +
    scale_y_continuous(limits = c(0, 150), breaks = seq(0, 150, 25)) +
    scale_fill_gradient(low = "#F78181", high = "#3B0B0B") +
    theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.04),
        axis.ticks.y = element_blank(),
        panel.background = element_rect(fill = "#ffffff"),
        panel.grid.major.x = element_line(colour = "grey"),
        axis.line.x = element_line(color = "grey"),
        axis.text = element_text(size = 10, colour = "black"))

ggplotly(ef_reg_plot, tooltip = "text")

b_region <- footprint %>%
    group_by(Region) %>%
    summarize(Biocapacity = sum(Total.Biocapacity)) %>%
    arrange(desc(Biocapacity)) %>%
    mutate(text = paste0("Biocapacity: ", Biocapacity, " gha"))

b_reg_plot <- ggplot(b_region, aes(x = reorder(Region, Biocapacity),
    y = Biocapacity, text = text)) +
    geom_col(aes(fill = Biocapacity), show.legend = FALSE) +
    coord_flip() +
    labs(title = "Biocapacity by Region", y = "global hectares (gha)", x = NULL) +
    scale_y_continuous(limits = c(0, 275), breaks = seq(0, 250, 50)) +
    scale_fill_gradient(low = "#9AFE2E", high = "#0B6121") +
    theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.04),
        axis.ticks.y = element_blank(),
        panel.background = element_rect(fill = "#ffffff"),
        panel.grid.major.x = element_line(colour = "grey"),
        axis.line.x = element_line(color = "grey"),
        axis.text = element_text(size = 10, colour = "black"))

ggplotly(b_reg_plot, tooltip = "text")

  • Show the relation between the human development index and the ecological footprint of a country, to see whether countries with a higher human development index also have a higher ecological footprint

Create a scatterplot of the human development index against the ecological footprint.

scat_plot_data <- footprint %>%
    select(Country, Population.millions, GDP.per.Capita,
        HDI, Total.Ecological.Footprint, Status) %>%
    rename(Population.in.millions = Population.millions,
        Human.Development.Index = HDI, Ecological.Footprint = Total.Ecological.Footprint) %>%
    mutate(text = paste0("Country: ", Country, "<br>",
        "HDI: ", Human.Development.Index, "<br>", "Ecological Footprint: ",
        Ecological.Footprint, "<br>", "GDP per Capita: ",
        "$", GDP.per.Capita))

scat_plot <- ggplot(scat_plot_data, aes(x = Human.Development.Index,
    y = Ecological.Footprint, text = text)) +
    # group = 1 keeps the per-country tooltip text from splitting the smoother into one group per point
    geom_smooth(aes(group = 1), col = "#61380B", size = 0.7) +
    geom_point(aes(color = Status, size = GDP.per.Capita)) +
    scale_y_continuous(limits = c(0, 18)) +
    scale_color_manual(values = c("#DF0101", "#04B486")) +
    labs(title = "HDI on Ecological Footprint", y = "Ecological Footprint",
        x = "Human Development Index") +
    theme(plot.title = element_text(face = "bold", size = 14, hjust = 0),
        panel.background = element_rect(fill = "#ffffff"),
        panel.grid.major.x = element_line(colour = "grey"),
        panel.grid.major.y = element_line(colour = "grey"),
        axis.line.x = element_line(color = "grey"),
        axis.line.y = element_line(color = "grey"),
        axis.text = element_text(size = 10, colour = "black"),
        legend.title = element_blank())

ggplotly(scat_plot, tooltip = "text") %>%
    layout(legend = list(orientation = "v", y = 1, x = 0))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

  • Provide the ecological status and other metrics that show the ecological activity of each country in the world

Create a leaflet map with relevant information for the popup

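# keep only the columns needed for the map (dropped by position);
# named leaflet_data so it does not share a name with the leaflet() function
leaflet_data <- footprint %>%
    dplyr::select(-c(2, 6:10, 12:16, 19:20))

shape <- raster::shapefile("data_input/TM_WORLD_BORDERS_SIMPL-0.3.shp")

# prepare data for color
leaf <- leaflet_data %>%
    mutate(diff = Total.Biocapacity - Total.Ecological.Footprint) %>%
    select(-Population.millions) %>%
    rename(NAME = Country)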

# combining data
shape@data <- shape@data %>%
    left_join(leaf, by = "NAME")

# cleaning data
shape@data[shape@data$NAME == "United States", c(12:17)] <- leaf[leaf$NAME ==
    "United States of America", c(2:7)]
shape@data[shape@data$NAME == "Russia", c(12:17)] <- leaf[leaf$NAME ==
    "Russian Federation", c(2:7)]
shape@data[shape@data$NAME == "Venezuela", c(12:17)] <- leaf[leaf$NAME ==
    "Venezuela, Bolivarian Republic of", c(2:7)]
shape@data[shape@data$NAME == "Republic of Moldova",
    c(12:17)] <- leaf[leaf$NAME == "Moldova", c(2:7)]
shape@data[shape@data$NAME == "The former Yugoslav Republic of Macedonia",
    c(12:17)] <- leaf[leaf$NAME == "Macedonia TFYR",
    c(2:7)]
shape@data[shape@data$NAME == "Iran (Islamic Republic of)",
    c(12:17)] <- leaf[leaf$NAME == "Iran, Islamic Republic of",
    c(2:7)]
shape@data[shape@data$NAME == "Democratic Republic of the Congo",
    c(12:17)] <- leaf[leaf$NAME == "Congo, Democratic Republic of",
    c(2:7)]
shape@data[shape@data$NAME == "United Republic of Tanzania",
    c(12:17)] <- leaf[leaf$NAME == "Tanzania, United Republic of",
    c(2:7)]
shape@data[shape@data$NAME == "Burma", c(12:17)] <- leaf[leaf$NAME ==
    "Myanmar", c(2:7)]

# Create a color palette with handmade bins.
library(RColorBrewer)
mybins <- c(-Inf, -5, 0, 5, Inf)
mypalette <- colorBin(palette = "RdYlGn", domain = shape@data$diff,
    na.color = "transparent", bins = mybins)

# prepare label
mytext <- paste(shape@data$NAME) %>%
    lapply(htmltools::HTML)

popup_shape <- paste("<h3><b>", shape@data$NAME, "</b></h3>",
    "Status: ", shape@data$Status, "<br>", "Ecological Footprint: ",
    shape@data$Total.Ecological.Footprint, " gha <br>",
    "Biocapacity: ", shape@data$Total.Biocapacity,
    " gha <br>", "HDI: ", shape@data$HDI, "<br>", "GDP per Capita: ",
    "$", shape@data$GDP.per.Capita, "<br>", sep = "")

m <- leaflet(shape) %>%
    addProviderTiles("Esri.NatGeoWorldMap") %>%
    setView(lat = 10, lng = 0, zoom = 2) %>%
    addPolygons(fillColor = ~mypalette(diff), color = "green",
        dashArray = "3", fillOpacity = 0.6, weight = 1,
        label = mytext, labelOptions = labelOptions(style = list(`font-weight` = "normal",
            padding = "3px 8px"), textsize = "13px",
            direction = "auto"), popup = popup_shape) %>%
    addLegend(pal = mypalette, values = ~diff, opacity = 0.9,
        title = paste("Remaining", "<br>", "Biocapacity (gha)"),
        position = "bottomleft")

m
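
The cleaning step above patches mismatched country names after the join by column position. One possible alternative, shown here only as a sketch and not as the approach used in this example, is to harmonise the names with a lookup table before joining. The helper harmonise_names() is hypothetical, and the table lists just three of the name pairs from the code above.

```r
# Names as they appear in the footprint data (left) mapped to the shapefile's NAME (right)
name_map <- c(
    "United States of America" = "United States",
    "Russian Federation" = "Russia",
    "Myanmar" = "Burma"
)

# Hypothetical helper: swap names found in the lookup, leave the rest unchanged
harmonise_names <- function(x, map) {
    hit <- x %in% names(map)
    x[hit] <- unname(map[x[hit]])
    x
}

harmonise_names(c("Indonesia", "Russian Federation", "Myanmar"), name_map)
```

Applied to the NAME column of the footprint table before left_join(), this would let the join match those countries directly, without relying on column indices that break if the column order changes.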

4.2.5 Where (ETC: 30 Mins)

Determine the sketch or design for the layout of the dashboard.

4.2.5.1 Overview

In this section, the plots and information for each tab/page are listed. The final Shiny dashboard for this example can be seen at https://nabiilahardini.shinyapps.io/Eco-Status/

Menu 1:

  • Bar chart of ecological footprint
  • Bar chart of biocapacity of each region
  • Scatter plot between Human Development Index (HDI) and the Ecological Footprint

Menu 2:

  • Leaflet map with a popup that shows relevant information about the economic and ecological status.
  • Proportion between biocapacity and ecological footprint among countries in the same region

Menu 3:

  • Raw Dataset

4.2.5.2 Detailed Layout

  • Menu 1: Overview

The first row shows the bar chart of ecological footprint and biocapacity of each region

The second row shows the scatter plot between Human Development Index (HDI) and the Ecological Footprint

  • Menu 2:

The first row shows the leaflet map with a popup that shows relevant information about the economic and ecological status.

The second row shows the proportion between biocapacity and ecological footprint among countries in the same region.

The third row shows a bar chart of each natural resource of a country and an info box regarding its ecological status.

  • Menu 3:

The third tab shows the raw data.
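
The layout described above could be sketched as a Shiny UI skeleton along these lines. This is an illustration under assumptions, not the code of the published dashboard: every output ID is a hypothetical placeholder, the tab titles "Map" and "Data" are invented for Menu 2 and Menu 3, and the info box is represented by a plain textOutput().

```r
library(shiny)

# Output IDs below are hypothetical placeholders; the server function must fill them in
ui <- navbarPage(
    "Eco-Status",
    tabPanel("Overview",
        # Row 1: the two regional bar charts side by side
        fluidRow(
            column(6, plotly::plotlyOutput("ef_region_plot")),
            column(6, plotly::plotlyOutput("b_region_plot"))
        ),
        # Row 2: HDI vs ecological footprint scatter plot
        fluidRow(column(12, plotly::plotlyOutput("scatter_plot")))
    ),
    tabPanel("Map",
        fluidRow(column(12, leaflet::leafletOutput("eco_map"))),
        fluidRow(column(12, plotly::plotlyOutput("proportion_plot"))),
        fluidRow(
            column(8, plotly::plotlyOutput("resource_plot")),
            column(4, textOutput("status_info"))
        )
    ),
    tabPanel("Data", tableOutput("raw_table"))
)
```

Sketching the UI first, before writing any server code, makes it easy to check the layout against the 5-second rule.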